
Conversation

@beicause (Contributor) commented Nov 14, 2025

Objective

The current OIT implementation stores a viewport-sized fragment buffer per layer, which uses far more memory than it needs to.

Solution

Implements a per-pixel linked list for OIT, which saves memory and can handle more layers. The implementation is based on https://github.com/KhronosGroup/Vulkan-Samples/tree/main/samples/api/oit_linked_lists
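For context, a minimal sketch of what the store pass of a per-pixel linked list typically looks like is shown below. This is illustrative only: the struct fields, binding layout, and names (`heads`, `fragments`, `counter`, `store_oit_fragment`) are assumptions made for the sketch, not the PR's actual shader code.

```wgsl
// Illustrative only; not the PR's actual bindings or layout.
struct OitFragment {
    color: u32,  // packed RGBA, e.g. via pack4x8unorm
    depth: f32,
    next: u32,   // index of the next fragment in this pixel's list, or INVALID
}

// One list head per screen pixel, plus one shared fragment pool and a counter.
@group(0) @binding(0) var<storage, read_write> heads: array<atomic<u32>>;
@group(0) @binding(1) var<storage, read_write> fragments: array<OitFragment>;
@group(0) @binding(2) var<storage, read_write> counter: atomic<u32>;

const INVALID: u32 = 0xffffffffu;

// Called from the transparent-pass fragment shader for each transparent fragment.
fn store_oit_fragment(pixel_index: u32, color: u32, depth: f32) {
    // Reserve a slot in the shared pool.
    let slot = atomicAdd(&counter, 1u);
    if slot >= arrayLength(&fragments) {
        return; // pool exhausted: drop the fragment
    }
    // Push the new fragment onto the front of this pixel's list.
    let prev_head = atomicExchange(&heads[pixel_index], slot);
    fragments[slot] = OitFragment(color, depth, prev_head);
}
```

The memory saving comes from the fragment pool being a single global budget shared by all pixels, rather than a viewport-sized allocation per layer.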

Testing

Tested with the order_independent_transparency example; I also added a new scene to it.

Details: screenshot attachment 屏幕截图_20251114_100337

@IceSentry self-assigned this Nov 14, 2025
@IceSentry self-requested a review Nov 14, 2025
@IceSentry removed their assignment Nov 14, 2025
@IceSentry added the C-Feature (A new feature, making something new possible) and A-Rendering (Drawing game state to the screen) labels Nov 14, 2025
@IceSentry added the S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) and D-Shaders (This code uses GPU shader languages) labels Nov 14, 2025
@IceSentry (Contributor) left a comment


This is awesome. Thank you so much for working on this. Sorry it took so long for me to review; I got sick the same week you opened the PR and haven't had time to come back to it since.

This is very close to what I had in mind as a follow-up to my original OIT impl, so I'm really happy to see it in action.

I managed to review the PR because I'm very familiar with OIT, but to make the diff simpler to follow I would suggest adding depth prepass support to the current OIT impl in a separate PR. That way the linked-list changes won't be mixed with the depth prepass changes.

Add `reserve_internal` to `BufferVec`
Add `capacity`, `set_label`, and `get_label` to `UninitBufferVec`
Use `Vec::reserve` to reduce allocations
@goodartistscopy (Contributor) commented Jan 11, 2026

This is in pretty good shape, I believe. If you're not in a hurry to merge, I'd like to try a possible improvement where, instead of pulling the fragments into an array first at resolve time, we would iterate over the linked list (N times) and pop the closest fragment each time (sketched after this comment). That might sound bad at first because it's O(N²) accesses to a storage buffer (non-contiguous, at that).
However:

  • After the first pass, the fragments might reside in cache, making accesses not so bad (to be validated, of course).
  • It eliminates the large local array, which might make the compiler allocate a ton of registers (hurting occupancy) or even spill to VRAM, which is not good either.
  • If that works, the number of layers per pixel is truly unlimited; the only limit is the global budget of the allocated fragment buffer.

What do you think?
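To make the suggestion concrete, here is a rough sketch of such a selection-style resolve loop. It is purely illustrative: it reuses the assumed `heads`/`fragments`/`INVALID` names from the store sketch above, `blend_under` is a placeholder front-to-back compositing function, and the depth comparisons assume smaller depth means closer (flip them for reverse-Z).

```wgsl
// Reuses the assumed layout from the store sketch above:
//   heads:     array<atomic<u32>>  (one list head per pixel)
//   fragments: array<OitFragment>  (.color packed u32, .depth f32, .next u32)
//   INVALID:   0xffffffffu         (end-of-list marker)

// Front-to-back "under" compositing (assumes premultiplied alpha).
fn blend_under(dst: vec4<f32>, src: vec4<f32>) -> vec4<f32> {
    return dst + (1.0 - dst.a) * src;
}

// Selection-style resolve: one list walk per blended fragment, no local array.
// Fragments with exactly equal depth are collapsed in this simplified version.
fn resolve_oit(pixel_index: u32) -> vec4<f32> {
    var result = vec4<f32>(0.0);
    var last_depth = -1.0; // depth of the fragment blended on the previous pass

    loop {
        // Find the closest fragment strictly behind the last one blended
        // (smaller depth = closer here; flip comparisons for reverse-Z).
        var best = INVALID;
        var best_depth = 3.40282e38; // ~f32 max
        var node = atomicLoad(&heads[pixel_index]);
        while node != INVALID {
            let d = fragments[node].depth;
            if d > last_depth && d < best_depth {
                best_depth = d;
                best = node;
            }
            node = fragments[node].next;
        }
        if best == INVALID {
            break; // all fragments blended
        }
        result = blend_under(result, unpack4x8unorm(fragments[best].color));
        last_depth = best_depth;
    }
    return result;
}
```

Whether the repeated, non-contiguous list walks stay cheap enough in practice is exactly the "to be validated" point above.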

@beicause (Contributor, Author) commented Jan 12, 2026

From what I see in the backwards memory allocation and register-based block sort papers (they are complex and not worth it, since games typically have fewer transparency layers), I think there is reason to believe that sorting in place in the SSBO would be slower.

